Target transfer Q-learning and its convergence analysis
Authors
Abstract
Similar Resources
Fastest Convergence for Q-learning
The Zap Q-learning algorithm introduced in this paper improves on Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-s...
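To make the matrix-gain idea concrete, here is a minimal tabular sketch of one Zap-style update, assuming indicator features over state-action pairs; the function name zap_q_update, the pseudo-inverse, and the step-size arguments are our illustrative choices, not the paper's exact construction. The fast step size beta tracks the gain matrix while the slow step size alpha drives the Newton-Raphson-like parameter step.

    import numpy as np

    def zap_q_update(Q, A_hat, s, a, r, s2, gamma, alpha, beta):
        # Tabular Zap-style step: Q is (nS, nA); A_hat is (d, d), d = nS * nA.
        nS, nA = Q.shape
        d = nS * nA
        theta = Q.reshape(d)                        # view of the Q-table as a vector
        phi = np.zeros(d)
        phi[s * nA + a] = 1.0                       # indicator feature of (s, a)
        a2 = int(np.argmax(Q[s2]))                  # greedy action at the next state
        phi2 = np.zeros(d)
        phi2[s2 * nA + a2] = 1.0
        td = r + gamma * theta @ phi2 - theta @ phi           # TD error
        A_sample = np.outer(phi, gamma * phi2 - phi)          # one-sample gain matrix
        A_hat += beta * (A_sample - A_hat)                    # fast timescale: track the mean
        theta -= alpha * (np.linalg.pinv(A_hat) @ (phi * td)) # slow Newton-like step
        return Q, A_hat

Replacing A_hat by the negative identity recovers ordinary Q-learning; the estimated matrix gain is what yields the optimal asymptotic variance the abstract refers to.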
Convergence of Optimistic and Incremental Q-Learning
Yishay Mansour. We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new and novel algo...
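As a concrete illustration of the optimistic variant described above, here is a minimal sketch; the environment interface (reset() returning a state, step(a) returning next state, reward, and a done flag) and the constants are our assumptions, not the paper's.

    import numpy as np

    def optimistic_q_learning(env, nS, nA, gamma=0.95, alpha=0.1,
                              r_max=1.0, episodes=500):
        # Initializing every Q-value at the upper bound r_max / (1 - gamma)
        # makes the greedy policy explore: untried actions look best until
        # their estimates are driven down by observed returns.
        Q = np.full((nS, nA), r_max / (1.0 - gamma))  # optimistic start
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                a = int(np.argmax(Q[s]))               # purely greedy policy
                s2, r, done = env.step(a)              # assumed toy interface
                target = r + (0.0 if done else gamma * np.max(Q[s2]))
                Q[s, a] += alpha * (target - Q[s, a])  # standard Q-update
                s = s2
        return Q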
The Asymptotic Convergence-Rate of Q-learning
In this paper we show that for discounted MDPs with discount factor γ > 1/2 the asymptotic rate of convergence of Q-learning is O(1/t^{R(1−γ)}) if R(1−γ) < 1/2 and O(√(log log t / t)) otherwise, provided that the state-action pairs are sampled from a fixed probability distribution. Here R = p_min/p_max is the ratio of the minimum and maximum state-action occupation frequencies. The results extend to conv...
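Restated in display form (the norm and the notation are our transcription of the rate quoted in the abstract):

    \[
    \|Q_t - Q^*\| =
      \begin{cases}
        O\!\bigl(t^{-R(1-\gamma)}\bigr), & \text{if } R(1-\gamma) < \tfrac12,\\[4pt]
        O\!\bigl(\sqrt{\log\log t / t}\bigr), & \text{otherwise,}
      \end{cases}
    \qquad R = \frac{p_{\min}}{p_{\max}}.
    \]

For example, with uniform sampling (R = 1) and γ = 0.9 the exponent is R(1 − γ) = 0.1, so the bound is O(t^{-0.1}), markedly slower than the O(√(log log t / t)) regime.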
On Convergence of q-Homotopy Analysis Method
The convergence of the q-homotopy analysis method (q-HAM) is studied in the present paper. It is proven that, under certain conditions, the solution of the equation (1 − nq)L[φ(x; q) − u₀(x)] = q h N[φ(x; q)] associated with the original problem exists as a power series in q. So, under a special constraint, the q-homotopy analysis method does converge to the exact solution of nonlinear problems. An error estimate is also provided. The ...
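For context, the q-HAM expansion the abstract alludes to has the standard form below (notation ours, not necessarily the paper's); the embedding parameter q runs over [0, 1/n], and the approximate solution is recovered at q = 1/n:

    \[
    \phi(x; q) = u_0(x) + \sum_{m=1}^{\infty} u_m(x)\, q^m,
    \qquad
    u(x) = \phi\!\left(x; \tfrac{1}{n}\right)
         = u_0(x) + \sum_{m=1}^{\infty} u_m(x) \left(\tfrac{1}{n}\right)^{m}.
    \]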
Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
In this paper, we address two issues of long-standing interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value ...
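To make the direct/indirect distinction concrete, a minimal sketch follows; the function names and the tabular representation are our illustrative assumptions. The direct learner updates a single Q-value per sample, while the indirect learner accumulates counts from which next-state distributions can be estimated for off-line value computation.

    import numpy as np
    from collections import defaultdict

    def q_learning_step(Q, s, a, r, s2, gamma=0.95, alpha=0.1):
        # Direct: one sample updates one Q-value; no model is kept.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])

    transition_counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    visit_counts = defaultdict(int)

    def model_update(s, a, r, s2):
        # Indirect: the same sample instead refines an empirical model.
        transition_counts[(s, a)][s2] += 1
        reward_sums[(s, a)] += r
        visit_counts[(s, a)] += 1

    def estimated_model(s, a):
        # Empirical next-state distribution and mean reward for (s, a).
        n = visit_counts[(s, a)]
        P = {s2: c / n for s2, c in transition_counts[(s, a)].items()}
        return P, reward_sums[(s, a)] / n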
Journal
Journal title: Neurocomputing
Year: 2020
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2020.02.117